Abstract



This entry reviews procedures for the computerized assembly of linear, sequential, and adaptive tests. The common approach is to view each of these test assembly problems as an instance of constrained combinatorial optimization. For each testing format, several potentially useful objective functions and types of constraints are discussed.
Introduction



As in many other areas of psychology, the availability of cheap, plentiful computational power has revolutionized the technology of educational and psychological testing. It is no longer necessary to restrict testing to items in a paper-and-pencil format administered in a group-based session. We now have the possibility to build multimedia testing environments in which test takers respond by manipulating objects on a screen, working with application programs, or manipulating devices with built-in sensors. Moreover, such tests can be assembled from item banks stored in computer memory and delivered immediately to examinees who walk in when they are ready to take the test.

Computerized assembly of tests from an item bank is treated as an optimization problem 
with a solution that has to satisfy a potentially long list of statistical and nonstatistical 
specifications for the test. The general nature of this optimization problem is outlined, and 
applications to the problems of assembling tests with a linear, sequential, and adaptive format 
are reviewed. 



Test Assembly as an Optimization Problem

The formal structure of a test assembly problem is known as a constrained combinatorial optimization problem. It is an optimization problem because the test should be assembled to be best in some sense. The problem is combinatorial because the test is a combination of items from the bank and optimization is over the space of admissible combinations. Finally, the problem is constrained because only those combinations of items that satisfy the list of test specifications are admissible.

The quintessential combinatorial optimization problem is the knapsack problem (Nemhauser & Wolsey, 1988). Suppose a knapsack has to be filled from a set of items indexed by $i = 1, \ldots, I$. Each item has utility $u_i$ and weight $w_i$. The optimal combination of items is required to have maximum utility but should not exceed weight limit $W$. The combination is found by defining decision variables

$$x_i = \begin{cases} 1 & \text{if item } i \text{ is selected,} \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

and solving the problem

$$\max \sum_{i=1}^{I} u_i x_i \qquad \text{(maximum utility)}$$

subject to

$$\sum_{i=1}^{I} w_i x_i \le W, \qquad \text{(weight limit)}$$

$$x_i \in \{0, 1\}, \quad i = 1, \ldots, I, \qquad \text{(range of variables)}$$

for an optimal set of values for the variables $x_1, \ldots, x_I$.
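For a very small bank, the knapsack problem can be solved by brute force, which makes the role of the 0-1 decision variables concrete. The following is a minimal sketch, not a practical solver: it enumerates all $2^I$ vectors $(x_1, \ldots, x_I)$, so it is feasible only for a handful of items. The function name `solve_knapsack` and the utilities, weights, and limit in the example are hypothetical.

```python
from itertools import product


def solve_knapsack(utilities, weights, weight_limit):
    """Exhaustive search over all 0-1 vectors x (illustration only)."""
    best_x, best_utility = None, float("-inf")
    for x in product((0, 1), repeat=len(utilities)):
        # Weight-limit constraint: sum_i w_i x_i <= W
        if sum(w * xi for w, xi in zip(weights, x)) > weight_limit:
            continue
        # Objective: sum_i u_i x_i
        utility = sum(u * xi for u, xi in zip(utilities, x))
        if utility > best_utility:
            best_x, best_utility = x, utility
    return best_x, best_utility


# Hypothetical bank of three items with utilities u_i and weights w_i.
x, value = solve_knapsack([6, 10, 12], [1, 2, 3], weight_limit=5)
```

Here the optimal vector is $x = (0, 1, 1)$ with utility 22. Realistic problems are solved with the integer-programming algorithms mentioned later in this entry rather than by enumeration.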

Problems with this structure are known as 0-1 linear programming (LP) problems (e.g., see Linear and Nonlinear Programming). Several test assembly problems can be formulated as a 0-1 LP problem; others need integer variables or a combination of integer and real variables. In a typical test assembly model, the objective function is used to maximize a statistical attribute of the test, whereas the constraints serve to guarantee its content validity.



Objective Functions in Test Assembly Problems 

Suppose the items in the bank are calibrated using an item response theory (IRT) model, for example, the two-parameter logistic (2PL) model:

$$P_i(\theta) = \Pr(U_i = 1 \mid \theta) = \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}, \tag{2}$$

where $\theta \in (-\infty, \infty)$ is the ability of the examinee, and $b_i \in (-\infty, \infty)$ and $a_i \in [0, \infty)$ are parameters for the difficulty and discriminating power of item $i$, respectively (e.g., see Factor Analysis and Latent Structure: IRT and Rasch Models).

A common objective function in IRT-based test assembly is based on the test information function, which is Fisher's measure of information on the unknown ability $\theta$ in the response vector $(U_1, \ldots, U_n)$, where $n$ is the number of items in the test. For the 2PL model the test information function is given by

$$I(\theta) = \sum_{i=1}^{n} \frac{[P_i'(\theta)]^2}{P_i(\theta)[1 - P_i(\theta)]},$$

with $P_i'(\theta) = \frac{\partial}{\partial \theta} P_i(\theta)$. Test information functions are additive in the contributions by the individual items, which are denoted by $I_i(\theta)$.

The first step in IRT-based test assembly is to formulate a target for the test information function. The next step is to assemble the test so that its actual information function is as close as possible to the target. Examples of popular targets are a uniform function over an ability interval in diagnostic testing and a peaked function at a cutoff score $\theta_c$ in admission testing. An objective function to realize the former is presented in (4)-(5) below. The latter can be realized using the objective function

$$\max \sum_{i=1}^{I} I_i(\theta_c)\, x_i, \tag{3}$$

under the condition of an appropriate set of constraints on the test. Other possible objective 
functions are maximization of classical test reliability and minimization of the length of the 
test. For a review of these and other examples, see van der Linden (1998). 
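When the only constraint is a fixed test length of $n$ items, objective (3) is solved exactly by taking the $n$ items with the largest information at $\theta_c$; with content constraints added, an integer-programming solver is needed instead. A minimal sketch under that simplifying assumption follows; the bank values and function names are hypothetical.

```python
import math


def item_information(theta, a, b):
    """2PL item information a^2 P (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def max_info_at_cutoff(bank, theta_c, n):
    """Solve (3) when the only constraint is a test length of n items."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: item_information(theta_c, *bank[i]),
                    reverse=True)
    return sorted(ranked[:n])  # indices i with x_i = 1


# Hypothetical bank of (a_i, b_i) pairs and a cutoff at theta_c = 0.
bank = [(1.0, -2.0), (1.5, 0.1), (0.8, 0.0), (2.0, 1.5), (1.2, -0.2)]
selected = max_info_at_cutoff(bank, theta_c=0.0, n=2)
```

The two discriminating items located close to the cutoff (indices 1 and 4) are chosen; the highly discriminating item at $b = 1.5$ loses out because it is too far from $\theta_c$.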

Constraints in Test Assembly Problems 

Formally, test specifications can be viewed as a series of upper and/or lower bounds on numbers of item attributes in the test or on functions thereof. An important distinction is between constraints on (1) categorical item attributes, (2) quantitative item attributes, and (3) logical relations between the items in the test. Categorical attributes are attributes such as item content, cognitive level, format, and use of graphics. Examples of quantitative attributes are statistical item parameters, expected response times, word counts, and readability indices. Logical (or Boolean) constraints deal with such issues in test assembly as items that cannot figure in the same test because they contain clues to each other's solution, or items that are organized as sets around common stimuli.

Let $V_j$ be the set of items in the bank with a common value for an attribute. The general shape of a constraint on a categorical item attribute in a test assembly model with 0-1 decision variables is

$$\sum_{i \in V_j} x_i \le n_j \quad (\text{or} \ge n_j),$$

where $n_j$ is a bound on the number of items from $V_j$. This type of constraint can also be formulated on intersections or unions of sets of items.
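A categorical constraint only counts selected items in a set, so checking a candidate selection is straightforward. The sketch below assumes the selection is stored as a list `x` of 0-1 values indexed by item and each $V_j$ as a Python set of item indices; the attribute sets and function names are hypothetical.

```python
def count_selected(x, items):
    """Left-hand side sum_{i in V_j} x_i: number of selected items in a set."""
    return sum(x[i] for i in items)


def categorical_constraint_ok(x, v_j, lower=None, upper=None):
    """Check lower <= sum_{i in V_j} x_i <= upper; a bound of None is not imposed."""
    count = count_selected(x, v_j)
    if lower is not None and count < lower:
        return False
    if upper is not None and count > upper:
        return False
    return True


# Hypothetical six-item bank with two attribute sets.
x = [1, 0, 1, 1, 0, 1]          # items 0, 2, 3, and 5 are selected
v_graphics = {0, 2, 4}          # items that use graphics
v_algebra = {2, 3, 4, 5}        # items with algebra content
```

Constraints on intersections or unions follow from the set operators, for instance `count_selected(x, v_graphics & v_algebra)` for items that are both graphical and algebraic.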

Constraints on quantitative attributes are typically on a function of their values for a set of items. For example, if a typical test taker has response time $t_i$ on item $i$ and the total testing time available is $T$ (both in seconds), a useful constraint on the test is

$$\sum_{i=1}^{I} t_i x_i \le T.$$

As an example of a logical constraint, suppose that $V_s$ represents the set of items in the bank with a common stimulus $s$, and that $n_s$ items have to be selected if and only if stimulus $s$ is selected. This requirement leads to the constraint

$$\sum_{i \in V_s} x_i = n_s z_s,$$

with $z_s$ being an auxiliary 0-1 decision variable for the selection of stimulus $s$.
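The quantitative and logical constraints above can be checked for a candidate selection in the same way. In the sketch below (hypothetical names and data), note how the stimulus constraint behaves: $z_s = 0$ forces no items from $V_s$, while $z_s = 1$ forces exactly $n_s$ of them.

```python
def time_constraint_ok(x, times, total_time):
    """Quantitative constraint: sum_i t_i x_i <= T (times in seconds)."""
    return sum(t * xi for t, xi in zip(times, x)) <= total_time


def stimulus_constraint_ok(x, z_s, v_s, n_s):
    """Logical constraint: sum_{i in V_s} x_i == n_s * z_s with 0-1 z_s."""
    return sum(x[i] for i in v_s) == n_s * z_s


# Hypothetical response times (seconds) for a three-item bank.
times = [60, 90, 45]
```

With a limit of 120 seconds, selecting items 0 and 2 (105 seconds) is admissible, but selecting all three items (195 seconds) is not.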

In a full-fledged test assembly problem, constraints may also be needed to deal, for 
instance, with stimulus attributes or with relations between different test forms if a set of forms 
is to be assembled simultaneously. For a review of these and other types of constraints, see van 
der Linden (1998; 2000a). 



Linear-Test Assembly 

Linear tests have a fixed number of items presented in a fixed order. For measurement over a larger ability interval, it is customary to choose a discrete set of target values for the information function, $T(\theta_k)$, $k = 1, \ldots, K$. In practice, because information functions are well-behaved continuous functions, target values at three to five equally spaced $\theta$ values suffice. The need to match more than one target value simultaneously creates a multiobjective decision problem.

An effective way to deal with multiple target values is to apply a maximin criterion. This 
criterion leads to the following core of a test assembly model 

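$$\max \; y \qquad \text{(common factor)} \tag{4}$$

subject to

$$\sum_{i=1}^{I} I_i(\theta_k)\, x_i \ge T(\theta_k)\, y, \quad k = 1, \ldots, K, \qquad \text{(minimum information at } \theta_k\text{)} \tag{5}$$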






Computerized Test Construction - 7 



where $y$ is a common factor in the right-hand-side bounds in (5) that is maximized, and the coefficients $T(\theta_k)$ control the shape of the information function (van der Linden & Boekkooi-Timminga, 1989). Constraints to deal with the remaining content specifications should be added to this model. For a large-scale testing program in education, it is not unusual to have hundreds of such constraints.
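For a fixed selection, the largest feasible $y$ in (5) is $\min_k \sum_i I_i(\theta_k) x_i / T(\theta_k)$, so the maximin model can be illustrated by exhaustive search over a tiny bank. The sketch below adds a test-length constraint of $n$ items as an assumption (the core model (4)-(5) does not state one) and uses hypothetical names and data; it only shows what the model optimizes, not how real instances are solved.

```python
import math
from itertools import combinations


def item_information(theta, a, b):
    """2PL item information a^2 P (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def achieved_y(selection, bank, thetas, targets):
    """Largest y with sum_i I_i(theta_k) x_i >= T(theta_k) y for every k."""
    return min(sum(item_information(t, *bank[i]) for i in selection) / target
               for t, target in zip(thetas, targets))


def maximin_assembly(bank, thetas, targets, n):
    """Maximize y, as in (4), over all n-item tests (exhaustive; illustration only)."""
    return max(combinations(range(len(bank)), n),
               key=lambda s: achieved_y(s, bank, thetas, targets))


# Hypothetical bank of (a_i, b_i) pairs and equal targets at theta = -1 and 1.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
thetas, targets = [-1.0, 1.0], [1.0, 1.0]
best = maximin_assembly(bank, thetas, targets, n=2)
```

The search picks the items at $b = -1$ and $b = 1$, which give about 0.355 at both target points; any pair containing the item at $b = 0$ falls to about 0.30 at one of the two points.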

Methods to solve test assembly models fall into two classes: algorithms that have been proven to find an optimal solution, and intuitively plausible heuristics, which typically select one item at a time. Well-known heuristics are those that pick the items with the largest impact on the test information function (Luecht, 1998) or with the smallest weighted average deviation from all bounds in the model (Swanson & Stocking, 1993). Optimal solutions can be found using a branch-and-bound algorithm (Nemhauser & Wolsey, 1988) or, if the structure of the model boils down to a network-flow problem, a simplified version of the simplex algorithm (Armstrong et al., 1995). Several algorithms and heuristics are implemented in the test assembly package ConTest (Timminga, van der Linden & Schweizer, 1996).
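To show how a one-item-at-a-time heuristic works and why it need not be optimal, here is a generic greedy rule for the maximin model: at each step, add the item that most increases the smallest ratio of test information to target. This is a simplified sketch under stated assumptions, not the specific procedure of Luecht (1998) or of Swanson and Stocking (1993); the names and data are hypothetical.

```python
import math


def item_information(theta, a, b):
    """2PL item information a^2 P (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def greedy_maximin(bank, thetas, targets, n):
    """Add, one at a time, the item that most raises min_k info(theta_k) / T(theta_k)."""
    info = [0.0] * len(thetas)  # current test information at each theta_k
    available = list(range(len(bank)))
    selected = []
    for _ in range(n):
        def ratio_after(i):
            return min((info[k] + item_information(t, *bank[i])) / targets[k]
                       for k, t in enumerate(thetas))
        best = max(available, key=ratio_after)
        selected.append(best)
        available.remove(best)
        for k, t in enumerate(thetas):
            info[k] += item_information(t, *bank[best])
    return selected


# Hypothetical bank of (a_i, b_i) pairs with equal targets at theta = -1 and 1.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
result = greedy_maximin(bank, [-1.0, 1.0], [1.0, 1.0], n=2)
```

On this three-item bank, the greedy rule first takes the item at $b = 0$ because it is the best single item, and it ends with a minimum information of about 0.30, whereas the pair of items at $b = -1$ and $b = 1$ would reach about 0.355 at both points. This is the price of committing to one item at a time.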

Sometimes it is necessary to build a set of linear test forms, for instance, parallel forms to support different testing sessions or forms of different difficulty for use in an evaluation study with a pretest-posttest design. Sequential application of a model for linear-test assembly is bound to produce solutions of decreasing quality, because the forms assembled first use up the best items in the bank. A simultaneous approach that balances the quality of the individual test forms can be obtained by replacing the decision variables in (1) by

$$x_{if} = \begin{cases} 1 & \text{if item } i \text{ is selected for form } f, \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$

using these variables to model the test specifications for all forms, and solving the model for all variables simultaneously. If no overlap between forms is allowed, logical constraints must be added to the model to prevent each item from being assigned to more than one form, that is, $\sum_{f} x_{if} \le 1$ for every item $i$. Efficient combinations of sequential and simultaneous approaches are presented in van der Linden and Adema (1998).
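The simultaneous approach can be illustrated on a tiny bank by enumerating every assignment of items to forms. In the sketch below, `assign[i] = f` encodes $x_{if} = 1$ and `assign[i] = -1` means the item is not used, so the no-overlap constraint $\sum_f x_{if} \le 1$ holds by construction. As an assumption for illustration, each form gets a fixed length and the objective is the smallest information-to-target ratio over all forms and all $\theta_k$; names and data are hypothetical.

```python
import math
from itertools import product


def item_information(theta, a, b):
    """2PL item information a^2 P (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def assemble_parallel_forms(bank, thetas, targets, n_forms, n):
    """Exhaustive simultaneous assembly of n_forms non-overlapping forms of n items."""
    best_forms, best_y = None, float("-inf")
    for assign in product(range(-1, n_forms), repeat=len(bank)):
        forms = [[i for i, f in enumerate(assign) if f == g] for g in range(n_forms)]
        if any(len(form) != n for form in forms):
            continue  # test-length constraint for every form
        # Common maximin factor over all forms and all theta_k
        y = min(sum(item_information(t, *bank[i]) for i in form) / target
                for form in forms
                for t, target in zip(thetas, targets))
        if y > best_y:
            best_forms, best_y = forms, y
    return best_forms, best_y


# Hypothetical bank: two easy and two hard items.
bank = [(1.0, -1.0), (1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
forms, y = assemble_parallel_forms(bank, [-1.0, 1.0], [1.0, 1.0], n_forms=2, n=2)
```

Because both forms are evaluated jointly, each receives one easy and one hard item; the search space grows as $(F + 1)^I$, which is why real applications rely on integer-programming solvers.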

Sequential Test Assembly 

Sequential test assembly is used in testing for selection or mastery with a cutoff score on 
the test that represents the level beyond which the test taker is accepted or considered to master 
the domain of knowledge tested, respectively. An obvious linear approach is to assemble a 
test using the objective function in (3), but a more efficient procedure is to assemble the test
sequentially, sampling one item from the bank at a time and stopping when the test taker is 
classified with enough precision. 

If the items are dichotomous, the number-correct score $X$ of a test taker follows a binomial distribution,

$$\Pr(X = x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x},$$

with $\pi$ being the success parameter and $n$ the (random) number of items sampled. In a sequential probability ratio test (SPRT) for the decision to reject test takers with $\pi \le \pi_0$ and accept those with $\pi \ge \pi_1$, with $(\pi_0, \pi_1)$ being a small interval around cutoff score $\pi_c$, the decision rule is based on the likelihood ratio

$$\lambda_n = \prod_{i=1}^{n} \frac{f(x_i \mid \pi_1)}{f(x_i \mid \pi_0)}.$$

If $\lambda_n \le A$ or $\lambda_n \ge B$, the decision is to reject or accept, respectively, whereas sampling of items is continued otherwise. The constants $A$ and $B$ are known to satisfy

$$A \ge \frac{\beta(\pi_1)}{1 - \alpha(\pi_0)}, \qquad B \le \frac{1 - \beta(\pi_1)}{\alpha(\pi_0)}, \tag{7}$$

with $\alpha(\pi_0)$ and $\beta(\pi_1)$ being the probabilities of a false positive and a false negative decision for test takers with $\pi = \pi_0$ and $\pi = \pi_1$, respectively. For more on sequential methods, see Sequential Statistical Methods.
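Because the binomial coefficient cancels in the ratio, $\lambda_n$ is simply a product of Bernoulli factors, one per sampled item, which makes the SPRT easy to run item by item. The sketch below works on the log scale and, as is common practice, sets the boundaries in (7) equal to Wald's bounds; that choice, the function name, and the parameter values in the example are assumptions.

```python
import math


def sprt_decision(responses, pi0, pi1, alpha, beta):
    """Wald's SPRT on 0-1 item scores, applied after each sampled item.

    Returns ("reject" | "accept" | "continue", number of items used).
    """
    # Boundaries from (7) taken with equality, on the log scale.
    log_a = math.log(beta / (1.0 - alpha))
    log_b = math.log((1.0 - beta) / alpha)
    log_lr = 0.0  # log of lambda_n
    for n, u in enumerate(responses, start=1):
        # Bernoulli factor f(x_i | pi1) / f(x_i | pi0) for this item
        if u == 1:
            log_lr += math.log(pi1 / pi0)
        else:
            log_lr += math.log((1.0 - pi1) / (1.0 - pi0))
        if log_lr <= log_a:
            return "reject", n
        if log_lr >= log_b:
            return "accept", n
    return "continue", len(responses)
```

With $\pi_0 = 0.6$, $\pi_1 = 0.8$, and $\alpha = \beta = 0.05$, a test taker who answers every item correctly is accepted after 11 items, while one who answers every item incorrectly is rejected after 5.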

Alternative sequential approaches to test assembly follow an IRT-based SPRT (Reckase, 
1983) or a Bayesian framework (Kingsbury & Weiss, 1983). Sequential Bayesian methods are 
further explained in Bayesian Decision Theory. 



Adaptive Test Assembly 

If the items in the pool are calibrated using a model as in (2), adaptive test assembly 
becomes possible. In adaptive test assembly, the items are selected to be optimal at the ability 
estimate of the test taker, updated by the computer after each new response. Adaptive testing 
leads to much shorter tests; relative to a linear test with the same precision, the savings in test length are typically over 50%.

To show the principle of adaptive testing, let $i = 1, \ldots, I$ denote the items in the pool and $k = 1, \ldots, K$ the items in the test. It follows that $i_k$ is the index of the item in the pool administered as the $k$th item in the test. The set $S_k = \{i_1, \ldots, i_{k-1}\}$ contains the first $k - 1$ items in the test. These items involve the response variables $\mathbf{U}_{k-1} = (U_{i_1}, \ldots, U_{i_{k-1}})$. The update of the ability estimate after $k - 1$ responses is denoted as $\hat{\theta}_{\mathbf{u}_{k-1}}$. Item $k$ in the test is selected to be optimal at $\hat{\theta}_{\mathbf{u}_{k-1}}$ among the items in the set $R_k = \{1, \ldots, I\} \setminus S_k$.

A popular criterion of optimality in adaptive testing is maximization of information at the current ability estimate, that is, selection of item $i_k$ according to the objective function

$$\max_{i \in R_k} I_i(\hat{\theta}_{\mathbf{u}_{k-1}}). \tag{8}$$

Alternative objective functions are based on Kullback-Leibler information or on Bayesian criteria that use the posterior distribution of $\theta$ after $k - 1$ items. These functions are reviewed in van der Linden and Pashley (2000).
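A minimal sketch of one adaptive step follows: it updates the ability estimate and then applies criterion (8) over $R_k$. The entry does not prescribe an estimator, so the sketch assumes an expected a posteriori (EAP) estimate computed on a grid under a standard normal prior; item indices are 0-based, and the names and bank are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL response probability, eq. (2)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information a^2 P (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(administered, responses, bank):
    """Posterior mean of theta on a grid, assuming a N(0, 1) prior."""
    grid = [g / 10.0 for g in range(-40, 41)]  # theta from -4 to 4 in steps of 0.1
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized prior density
        for i, u in zip(administered, responses):
            p = p_2pl(t, *bank[i])
            w *= p if u == 1 else 1.0 - p  # likelihood of each observed response
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def select_next_item(bank, administered, theta_hat):
    """Eq. (8): the most informative item at theta_hat among R_k."""
    used = set(administered)  # S_k
    remaining = [i for i in range(len(bank)) if i not in used]  # R_k
    return max(remaining, key=lambda i: item_information(theta_hat, *bank[i]))


# Hypothetical pool of (a_i, b_i) pairs.
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
first = select_next_item(bank, [], eap_estimate([], [], bank))
theta_hat = eap_estimate([first], [1], bank)  # after a correct response
second = select_next_item(bank, [first], theta_hat)
```

The highly discriminating item at $b = 0$ is given first; after a correct answer the estimate moves above zero, and the next item chosen is the one whose difficulty is closest to the new estimate.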

Several procedures have been suggested to realize content constraints on adaptive tests. 
The four major approaches are: (1) partitioning the bank according to the main item attributes 
and spiraling item selection among the classes in the partition to realize a desired content 
distribution; (2) building deviations from content constraints into the objective function 
(Swanson & Stocking, 1993); (3) testing from a pool with small sets of items built according 
to content specifications; and (4) using a shadow-test approach in which, prior to the selection of each item, a full-length linear test is assembled that contains all previously administered items, meets all content constraints, and is optimal at the current ability estimate, and from which the most informative item is selected for administration (van der Linden, 2000b).
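The first of these approaches is the simplest to sketch: a repeating sequence of content classes fixes which class the next item must come from, and criterion (8) is applied within that class. The sequence `spiral` below is a hypothetical encoding of a desired content distribution (two class-A items for every class-B item); positions `k` are 0-based, and all names and data are assumptions.

```python
import math


def item_information(theta, a, b):
    """2PL item information a^2 P (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def next_item_spiraled(bank, content, spiral, k, theta_hat, administered):
    """Approach (1): maximum information within the class scheduled for position k."""
    target = spiral[k % len(spiral)]  # content class required at this position
    used = set(administered)
    candidates = [i for i in range(len(bank))
                  if i not in used and content[i] == target]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))


# Hypothetical pool of (a_i, b_i) pairs with content classes.
bank = [(1.0, 0.0), (2.0, 0.0), (1.5, 0.0), (1.0, 0.0)]
content = ["A", "A", "B", "B"]
spiral = ["A", "A", "B"]
```

The design choice is that content is controlled by the sequence alone, so the rule cannot trade a slightly less informative item for better content balance; the shadow-test approach avoids that limitation at the cost of solving a test assembly model before every item.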

Adaptive testing is currently one of the dominant modes of computerized testing. Several 
aspects of computerized adaptive testing not addressed in this entry are reviewed in Wainer 
(1990). 